Basecalling is an essential step in nanopore sequencing analysis, in which the raw signals of nanopore sequencers are converted into nucleotide sequences, i.e., reads. State-of-the-art basecallers employ complex deep learning models to achieve high basecalling accuracy. This makes basecalling computationally inefficient and memory-hungry, bottlenecking the entire genome analysis pipeline. However, for many applications, the majority of reads do not match the reference genome of interest (i.e., target reference) and are thus discarded in later steps of the genomics pipeline, wasting the basecalling computation. To overcome this issue, we propose TargetCall, the first fast and widely-applicable pre-basecalling filter to eliminate the wasted computation in basecalling. TargetCall's key idea is to discard reads that will not match the target reference (i.e., off-target reads) prior to basecalling. TargetCall consists of two main components: (1) LightCall, a lightweight neural network basecaller that produces noisy reads; and (2) Similarity Check, which labels each of these noisy reads as on-target or off-target by matching them to the target reference. TargetCall filters out all off-target reads before basecalling, and the highly accurate but slow basecalling is performed only on the raw signals whose noisy reads are labeled as on-target. Our thorough experimental evaluations using both real and simulated data show that TargetCall 1) improves the end-to-end basecalling performance of the state-of-the-art basecaller by 3.31x while maintaining high (98.88%) sensitivity in keeping on-target reads, 2) maintains high accuracy in downstream analysis, 3) precisely filters out up to 94.71% of off-target reads, and 4) achieves better performance, sensitivity, and generality compared to prior works. We freely open-source TargetCall at https://github.com/CMU-SAFARI/TargetCall.
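A minimal sketch of the filtering idea in Python (illustrative only, not the TargetCall implementation): a lightweight basecaller produces a noisy read, a similarity check scores it against the target reference, and only on-target signals reach the accurate basecaller. The function names, scoring interface, and threshold are assumptions.

```python
# Sketch of a TargetCall-style pre-basecalling filter (illustrative only).
# `light_basecall`, `align_score`, and `accurate_basecall` stand in for
# LightCall, Similarity Check, and the full basecaller, respectively.

def pre_basecalling_filter(raw_signals, target_reference,
                           light_basecall, align_score, accurate_basecall,
                           min_score=0.8):
    """Basecall only the raw signals whose noisy reads match the target."""
    on_target_reads = []
    for signal in raw_signals:
        noisy_read = light_basecall(signal)                 # fast, lower accuracy
        score = align_score(noisy_read, target_reference)   # e.g., fraction aligned
        if score >= min_score:                              # labeled on-target
            on_target_reads.append(accurate_basecall(signal))
        # off-target signals are discarded before the expensive basecalling step
    return on_target_reads
```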
This paper presents a construction of a proper and stable labelled sample compression scheme of size $O(\mathrm{VCD}^2)$ for any finite concept class, where $\mathrm{VCD}$ denotes the Vapnik-Chervonenkis Dimension. The construction is based on a well-known model of machine teaching, referred to as recursive teaching dimension. This substantially improves on the currently best known bound on the size of sample compression schemes (due to Moran and Yehudayoff), which is exponential in $\mathrm{VCD}$. The long-standing open question whether the smallest size of a sample compression scheme is in $O(\mathrm{VCD})$ remains unresolved, but our results show that research on machine teaching is a promising avenue for the study of this open problem. As further evidence of the strong connections between machine teaching and sample compression, we prove that the model of no-clash teaching, introduced by Kirkpatrick et al., can be used to define a non-trivial lower bound on the size of stable sample compression schemes.
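For context, the LaTeX snippet below restates the standard definition of the object whose size is being bounded; it paraphrases common textbook formulations rather than the paper's exact wording.

```latex
% A labelled sample compression scheme of size k for a concept class C over
% a domain X is a pair of maps (\kappa, \rho), where \kappa compresses and
% \rho reconstructs, satisfying
\begin{align*}
  &\kappa(S) \subseteq S, \quad |\kappa(S)| \le k,
      && \text{for every } C\text{-realizable labelled sample } S,\\
  &\rho(\kappa(S))(x) = y,
      && \text{for every labelled example } (x,y) \in S.
\end{align*}
% The scheme is proper if \rho always outputs a concept in C. The paper
% constructs a proper, stable such scheme with k = O(\mathrm{VCD}^2).
```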
Vision transformers (ViTs) are quickly becoming the de-facto architecture for computer vision, yet we understand very little about why they work and what they learn. While existing studies visually analyze the mechanisms of convolutional neural networks, an analogous exploration of ViTs remains challenging. In this paper, we first address the obstacles to performing visualizations on ViTs. Assisted by these solutions, we observe that neurons in ViTs trained with language model supervision (e.g., CLIP) are activated by semantic concepts rather than visual features. We also explore the underlying differences between ViTs and CNNs, and we find that transformers detect image background features, just like their convolutional counterparts, but their predictions depend far less on high-frequency information. On the other hand, both architecture types behave similarly in the way features progress from abstract patterns in early layers to concrete objects in late layers. In addition, we show that ViTs maintain spatial information in all layers except the final layer. In contrast to previous works, we show that the last layer most likely discards the spatial information and behaves as a learned global pooling operation. Finally, we conduct large-scale visualizations on a wide range of ViT variants, including DeiT, CoaT, ConViT, PiT, Swin, and Twin, to validate the effectiveness of our method.
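As a concrete starting point for this kind of analysis, the sketch below hooks every transformer block of a timm ViT and prints a simple per-layer spatial-variance statistic of the patch tokens; the model name, statistic, and interpretation are illustrative, and this is not the paper's visualization pipeline.

```python
# Sketch: inspect patch-token activations of a ViT block by block, e.g. to
# probe how long spatial information survives. Assumes the `timm` library;
# set pretrained=True to analyze trained weights.
import torch
import timm

model = timm.create_model("vit_base_patch16_224", pretrained=False).eval()
activations = {}

def hook(name):
    def fn(module, inputs, output):
        activations[name] = output.detach()   # (batch, 1 + num_patches, dim)
    return fn

for i, block in enumerate(model.blocks):      # one hook per transformer block
    block.register_forward_hook(hook(f"block_{i}"))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))        # dummy image

# Spatial variance of patch tokens per block (CLS token excluded); a sharp
# drop only at the final block would match the observation that the last
# layer behaves like a learned global pooling.
for name, act in activations.items():
    patch_tokens = act[:, 1:, :]
    print(name, patch_tokens.var(dim=1).mean().item())
```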
With the widespread deployment of cameras, video-based monitoring has attracted considerable attention for purposes such as assisted living. Temporal redundancy and the sheer size of raw video are two of the most common problems facing video processing algorithms. Most existing methods focus on improving accuracy by examining consecutive frames, which is computationally expensive and ill-suited to real-time applications. Since video is mostly stored and transmitted in compressed format, it is available in that form on many devices. Compressed video contains a wealth of useful information, such as motion vectors and quantized coefficients, and using this readily available information properly can greatly improve the performance of video understanding methods. This paper proposes an approach that uses residual data, which is directly available in compressed video and can be obtained through a partial decoding process. In addition, a method for accumulating similar residuals is proposed, which greatly reduces the number of frames processed for recognition. Applying a neural network exclusively to the accumulated residuals in the compressed domain accelerates performance, while the classification results remain highly competitive with methods that operate on raw video.
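A minimal sketch of the accumulation idea in Python, assuming residual frames have already been obtained by partial decoding: residuals are summed until their energy crosses a threshold, and the classifier runs once per accumulated burst. The threshold, energy measure, and classifier are placeholders.

```python
# Sketch of residual accumulation in the compressed domain (illustrative).
# `residual_frames` would come from partially decoding the compressed stream;
# `classify` is any network trained on accumulated residuals.
import numpy as np

def recognize_from_residuals(residual_frames, classify, energy_threshold=50.0):
    predictions = []
    accumulated = np.zeros_like(residual_frames[0], dtype=np.float32)
    for residual in residual_frames:
        accumulated += residual
        if np.abs(accumulated).mean() >= energy_threshold:
            predictions.append(classify(accumulated))  # one inference per burst
            accumulated[:] = 0.0                       # reset the accumulator
    return predictions
```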
Competition between traditional platforms is known to improve user utility by aligning platform actions with user preferences. But how much alignment arises in data-driven marketplaces? To study this question from a theoretical perspective, we introduce a duopoly market in which the platforms' actions are bandit algorithms and the two platforms compete for user participation. A salient feature of this market is that the quality of recommendations depends on both the bandit algorithm and the amount of data provided by user interactions. This interdependence between algorithm performance and users' actions complicates the structure of market equilibria and their quality in terms of user utility. Our main finding is that competition in this market does not perfectly align market outcomes with user utility. Interestingly, market outcomes exhibit misalignment not only when the platforms maintain separate data repositories, but also when they have a shared data repository. Nonetheless, the data-sharing assumption affects which mechanisms drive the misalignment and the specific form it takes (e.g., the quality of best-case and worst-case market outcomes). More broadly, our work illustrates that competition in digital marketplaces has subtle consequences for user utility that merit further investigation.
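A toy simulation conveying the setting (not the paper's formal model): two epsilon-greedy platforms act on the same arms, and each arriving user joins the platform with the higher empirical quality so far, so data accrues to whichever platform users favor. All names and parameters are illustrative.

```python
# Toy competing-bandits simulation: recommendation quality depends on both
# the algorithm and how much data each platform has collected from users.
import numpy as np

rng = np.random.default_rng(0)
true_means = rng.uniform(0, 1, size=10)          # underlying arm qualities

class Platform:
    def __init__(self, n_arms, eps=0.1):
        self.counts = np.zeros(n_arms); self.sums = np.zeros(n_arms); self.eps = eps
    def pick(self):
        if rng.random() < self.eps or self.counts.min() == 0:
            return rng.integers(len(self.counts))      # explore
        return int(np.argmax(self.sums / self.counts)) # exploit
    def update(self, arm, reward):
        self.counts[arm] += 1; self.sums[arm] += reward
    def quality(self):
        total = self.counts.sum()
        return self.sums.sum() / total if total else 0.0

a, b = Platform(10), Platform(10)
for _ in range(5000):                            # users arrive sequentially
    platform = a if a.quality() >= b.quality() else b
    arm = platform.pick()
    platform.update(arm, rng.binomial(1, true_means[arm]))

print("data on A:", int(a.counts.sum()), "| data on B:", int(b.counts.sum()))
```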
We study Stackelberg games in which a principal repeatedly interacts with a long-lived, non-myopic agent without knowing the agent's payoff function. Although learning in Stackelberg games is well understood when the agent is myopic, non-myopic agents introduce additional complications. In particular, a non-myopic agent may strategically select actions that are inferior in the present in order to mislead the principal's learning algorithm and obtain better outcomes in the future. We provide a general framework that reduces learning in the presence of non-myopic agents to robust bandit optimization in the presence of myopic agents. Through the design and analysis of minimally reactive bandit algorithms, our reduction trades off the statistical efficiency of the principal's learning algorithm against its effectiveness in inducing near-best responses from the agent. We apply this framework to Stackelberg security games (SSGs), pricing with an unknown demand curve, strategic classification, and general finite Stackelberg games. In each setting, we characterize the type and impact of the misspecifications present in near-best responses and develop a learning algorithm that is robust to such misspecifications. Along the way, we improve upon the state-of-the-art $O(n^3)$ query complexity of learning in SSGs by uncovering a fundamental structural property of such games, a result of independent interest beyond learning with non-myopic agents.
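One ingredient of the reduction is a bandit algorithm that remains sound when rewards reflect only approximate best responses. The sketch below shows a generic eps-inflated UCB, offered as an illustration of that robustness idea; it is not the paper's algorithm, and `pull`, `eps`, and the bonus form are assumptions.

```python
# Sketch of a misspecification-robust UCB: the confidence bonus is inflated
# by a tolerance `eps`, reflecting that observed responses may be only
# eps-approximate best responses (illustrative, not the paper's algorithm).
import numpy as np

def robust_ucb(pull, n_arms, horizon, eps=0.05):
    counts = np.zeros(n_arms); sums = np.zeros(n_arms)
    for t in range(horizon):
        if t < n_arms:
            arm = t                                    # pull each arm once
        else:
            means = sums / counts
            bonus = np.sqrt(2 * np.log(horizon) / counts) + eps  # eps-inflated
            arm = int(np.argmax(means + bonus))
        counts[arm] += 1; sums[arm] += pull(arm)       # eps-corrupted reward
    return sums / counts                               # empirical arm means
```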
Time delays in communication networks are one of the major concerns for deploying robots at the edge. This article proposes a multi-stage nonlinear model predictive control (NMPC) scheme capable of handling different network-induced time delays, establishing a control framework that guarantees collision-free navigation of micro aerial vehicles (MAVs). The study introduces a novel approach that accounts for different sampling times through a discretization scenario tree, in contrast to typical existing multi-stage NMPC, in which the scenario tree models system uncertainties. Furthermore, the approach assigns adaptive weights to the multi-stage NMPC scenarios according to the probability of time delays in the communication link. Thanks to the multi-stage formulation, the resulting optimal control actions are valid for multiple sampling times. Finally, the overall efficacy of the proposed novel control framework is demonstrated in various tests and different simulation environments.
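To make the scenario-tree idea concrete, here is a toy weighted multi-scenario optimization in Python: one control sequence is scored under several sampling times (one per assumed delay), with scenario costs weighted by delay probabilities, loosely mirroring the adaptive weighting described above. The dynamics, horizon, and numbers are all illustrative.

```python
# Toy flavor of multi-stage NMPC over sampling-time scenarios: a 1D double
# integrator stands in for the MAV dynamics; scenario costs are weighted by
# the probability of each network delay (illustrative, not the paper's model).
import numpy as np
from scipy.optimize import minimize

H = 10                                                # horizon (steps)
scenarios = [(0.02, 0.6), (0.05, 0.3), (0.10, 0.1)]   # (sampling time, prob)
x0, target = np.array([0.0, 0.0]), 1.0                # [pos, vel], goal position

def scenario_cost(u, dt):
    x = x0.copy(); cost = 0.0
    for k in range(H):
        x = x + dt * np.array([x[1], u[k]])           # Euler-discretized dynamics
        cost += (x[0] - target) ** 2 + 0.01 * u[k] ** 2
    return cost

def total_cost(u):                                    # probability-weighted cost
    return sum(p * scenario_cost(u, dt) for dt, p in scenarios)

res = minimize(total_cost, np.zeros(H), bounds=[(-2.0, 2.0)] * H)
print("first control action:", res.x[0])              # applied, then re-planned
```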
In this article, we propose a reactive constrained navigation scheme with embedded obstacle avoidance for an unmanned aerial vehicle (UAV), enabling navigation in obstacle-dense environments. The proposed navigation architecture is based on nonlinear model predictive control (NMPC) and utilizes an onboard 2D LiDAR to detect obstacles and translate key geometric information about the environment, online, into parametric constraints for the NMPC that restrict the UAV's available position space. The article also emphasizes the real-world implementation and experimental validation of the proposed reactive navigation scheme, applying it in multiple challenging laboratory experiments and comparing it against related reactive obstacle-avoidance methods. The solver used in the proposed approach is the Optimization Engine (OpEn) with the Proximal Averaged Newton for Optimal Control (PANOC) algorithm, in which a penalty method is employed to properly account for obstacle and input constraints during the navigation task. The proposed novel scheme allows fast solutions while using limited onboard computational power, a necessary feature for the overall closed-loop performance of the UAV, and applies in multiple real-time scenarios. The combination of embedded obstacle avoidance and real-time applicability makes the proposed reactive constrained navigation scheme an elegant framework for UAVs, capable of performing fast nonlinear control, local path planning, and obstacle avoidance, all embedded in the control layer.
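The online translation of LiDAR data into parametric constraints is the scheme's central mechanism; the sketch below illustrates one plausible version of that step (cluster scan points, fit enclosing circles, emit keep-out constraints). The clustering rule and margin are assumptions, and the downstream OpEn/PANOC solve is not shown.

```python
# Sketch of constraint extraction from a 2D LiDAR scan: cluster obstacle hits
# and fit an enclosing circle to each cluster; each circle (center c, radius r)
# then enters the NMPC as a position constraint ||p - c|| >= r (illustrative).
import numpy as np

def lidar_to_circle_constraints(points, gap=0.3, margin=0.2):
    """points: (N, 2) obstacle hits in the body frame, sorted by scan angle."""
    clusters, current = [], [points[0]]
    for p in points[1:]:
        if np.linalg.norm(p - current[-1]) < gap:     # same obstacle
            current.append(p)
        else:                                         # gap: start a new cluster
            clusters.append(np.array(current)); current = [p]
    clusters.append(np.array(current))

    constraints = []
    for c in clusters:
        center = c.mean(axis=0)
        radius = np.linalg.norm(c - center, axis=1).max() + margin
        constraints.append((center, radius))          # NMPC: ||p - center|| >= radius
    return constraints
```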
It is commonly believed that accurate semantic segmentation requires high internal resolution combined with expensive operations (e.g., atrous convolutions), resulting in slow speed and large memory usage. In this paper, we question this belief and demonstrate that neither high internal resolution nor atrous convolutions are necessary. Our intuition is that although segmentation is a dense per-pixel prediction task, the semantics of each pixel typically depend on both nearby neighbors and far-away context; a more powerful multi-scale feature fusion network therefore plays a crucial role. Following this intuition, we revisit the conventional multi-scale feature space (typically capped at P5) and extend it to a much richer space, up to P9, where the smallest features are only 1/512 of the input size and thus have very large receptive fields. To process such a rich feature space, we leverage the recent BiFPN to fuse the multi-scale features. Based on these insights, we develop a simplified segmentation model, named ESeg, that has neither high internal resolution nor expensive atrous convolutions. Perhaps surprisingly, our simple method achieves better accuracy at faster speeds than prior art across multiple datasets. In real-time settings, ESeg-Lite-S reaches 76.0% mIoU on Cityscapes [12] at 189 FPS, outperforming FasterSeg [9] (73.1% mIoU at 170 FPS). Our ESeg-Lite-L runs at 79 FPS and reaches 80.1% mIoU, largely closing the gap between real-time and high-performance segmentation models.
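A minimal sketch of the extended feature space in PyTorch: P6..P9 are derived from P5 by repeated downsampling and fused top-down with learnable positive weights, a much-simplified stand-in for the BiFPN used in the paper. The class name, channel count, and fusion rule are illustrative.

```python
# Sketch: build P5..P9 (each level halves resolution, so P9 is 1/512 of the
# input) and fuse them top-down with learnable weights (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPyramidFusion(nn.Module):
    def __init__(self, levels=5):                    # P5..P9
        super().__init__()
        self.w = nn.Parameter(torch.ones(levels - 1, 2))  # per-merge weights

    def forward(self, p5):
        feats = [p5]
        for _ in range(4):                           # derive P6..P9 by pooling
            feats.append(F.max_pool2d(feats[-1], kernel_size=2))
        for i in range(len(feats) - 2, -1, -1):      # top-down weighted fusion
            up = F.interpolate(feats[i + 1], size=feats[i].shape[-2:], mode="nearest")
            w = F.relu(self.w[i])
            feats[i] = (w[0] * feats[i] + w[1] * up) / (w[0] + w[1] + 1e-4)
        return feats                                 # fused P5..P9

# Example: P5 of a 512x512 input is 16x16, so P9 ends up 1x1 (1/512 of input).
fused = TinyPyramidFusion()(torch.randn(1, 64, 16, 16))
print([f.shape[-1] for f in fused])                  # [16, 8, 4, 2, 1]
```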
We design an open-vocabulary image segmentation model to organize an image into meaningful regions indicated by arbitrary texts. Recent works (CLIP and ALIGN), despite attaining impressive open-vocabulary classification accuracy with image-level caption labels, are nonetheless unable to segment visual concepts at the pixel level. We argue that these models miss an important step of visual grouping, which organizes pixels into groups before learning visual-semantic alignments. We propose OpenSeg to address the above issue while still leveraging scalable image-level caption supervision. First, it learns to propose segmentation masks for possible organizations of the image. Then it learns visual-semantic alignments by aligning each word in a caption to one or a few predicted masks. We find that mask representations are the key to supporting the learning of image segmentation from captions, making it possible to scale up both dataset and vocabulary sizes. OpenSeg significantly outperforms LSeg, the recent open-vocabulary method, by +19.9 mIoU on the PASCAL dataset.
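A small sketch of the word-to-mask alignment step: caption words attend over predicted mask embeddings, and the similarities are pooled into a caption-image score suitable for a contrastive objective. The function, shapes, and pooling rule are assumptions, not the paper's implementation.

```python
# Sketch of word-to-mask alignment: each caption word attends over mask
# embeddings; word-mask similarities are pooled into one caption-image score
# that can feed a contrastive loss (illustrative shapes and pooling).
import torch
import torch.nn.functional as F

def caption_image_score(word_emb, mask_emb, temperature=0.07):
    """word_emb: (num_words, d); mask_emb: (num_masks, d)."""
    words = F.normalize(word_emb, dim=-1)
    masks = F.normalize(mask_emb, dim=-1)
    sim = words @ masks.T / temperature            # (num_words, num_masks)
    attn = sim.softmax(dim=-1)                     # each word -> a few masks
    per_word = (attn * sim).sum(dim=-1)            # expected similarity per word
    return per_word.mean()                         # pooled caption-image score

score = caption_image_score(torch.randn(7, 256), torch.randn(12, 256))
print(float(score))
```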